Comprehensive Attention Self-Distillation for Weakly-Supervised Object Detection

Neural Information Processing Systems

Weakly Supervised Object Detection (WSOD) has emerged as an effective tool for training object detectors using only image-level category labels. However, without object-level labels, WSOD detectors are prone to detecting bounding boxes on salient objects, clustered objects, and discriminative object parts. Moreover, image-level category labels do not enforce consistent object detection across different transformations of the same images. To address these issues, we propose a Comprehensive Attention Self-Distillation (CASD) training approach for WSOD. To balance feature learning among all object instances, CASD computes comprehensive attention aggregated from multiple transformations and feature layers of the same images. To enforce consistent spatial supervision on objects, CASD conducts self-distillation on the WSOD network, such that the comprehensive attention is approximated simultaneously by multiple transformations and feature layers of the same images. CASD produces new state-of-the-art WSOD results on standard benchmarks such as PASCAL VOC 2007/2012 and MS-COCO.
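
To make the mechanism concrete, here is a minimal PyTorch-style sketch of the idea as described in the abstract. This is an illustrative reading, not the authors' implementation: the attention definition (channel-averaged feature magnitude), the element-wise max aggregation, and the MSE distillation loss are assumptions chosen for the sketch.

    import torch
    import torch.nn.functional as F

    def attention_map(feat):
        # Spatial attention from a feature map (B, C, H, W) -> (B, 1, H, W).
        # Channel-averaged magnitude is one common choice (an assumption here).
        att = feat.abs().mean(dim=1, keepdim=True)
        b, _, h, w = att.shape
        flat = att.view(b, -1)
        mins = flat.min(dim=1, keepdim=True).values
        maxs = flat.max(dim=1, keepdim=True).values
        flat = (flat - mins) / (maxs - mins + 1e-6)  # per-image [0, 1] normalization
        return flat.view(b, 1, h, w)

    def casd_loss(attentions):
        # `attentions`: attention maps of the SAME image obtained from multiple
        # transformations (e.g., original/flipped/rescaled, aligned back to a
        # common resolution) and/or multiple feature layers (resized to match).
        # "Comprehensive" attention: element-wise max (union) over all sources.
        comprehensive = torch.stack(attentions, dim=0).max(dim=0).values
        # Self-distillation: each individual map regresses toward the aggregate;
        # detach() keeps the aggregated map a fixed teacher target.
        target = comprehensive.detach()
        return sum(F.mse_loss(a, target) for a in attentions) / len(attentions)

In this reading, the max-aggregated map covers every object region that any single view highlights, and the distillation term pushes each view toward that union, which is one way the abstract's "balanced feature learning" and "consistent spatial supervision" could be enforced.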


Comprehensive Attention Self-Distillation for Weakly-Supervised Object Detection Supplementary Material

Neural Information Processing Systems

Figure 1 (bottom row): CASD detections overlaid with attention maps. Recall that WSOD conducts classification on object proposals (e.g., bounding boxes generated by Selective Search). Figure 1 shows both the success and the failure cases of CASD; the failure cases could be mitigated by hard-sample mining during CASD training. The localization advantages of CASD come from its learning of comprehensive attention (see the bottom row of Figure 1). Note that CorLoc only evaluates the localization accuracy of detectors.
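
Since the excerpt reports CorLoc, a brief sketch of how CorLoc is conventionally computed may help: an image counts as correctly localized for a class when the detector's top-scoring box for that class overlaps some ground-truth box of that class with IoU >= 0.5. The helper names below are ours, not from the paper.

    def iou(a, b):
        # Intersection-over-union of two boxes in (x1, y1, x2, y2) format.
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / (union + 1e-9)

    def corloc(samples):
        # `samples`: list of (top_box, gt_boxes) pairs over images containing
        # the class, where `top_box` is the detector's highest-scoring box for
        # the class and `gt_boxes` are that class's ground-truth boxes.
        hits = sum(any(iou(top, gt) >= 0.5 for gt in gts)
                   for top, gts in samples)
        return hits / max(len(samples), 1)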



Review for NeurIPS paper: Comprehensive Attention Self-Distillation for Weakly-Supervised Object Detection

Neural Information Processing Systems

I am worried that this may also be the case for WSOD. I observe many warning signs in this paper; for example, the method has a very large number of hyperparameters.


Review for NeurIPS paper: Comprehensive Attention Self-Distillation for Weakly-Supervised Object Detection

Neural Information Processing Systems

This paper proposes a Comprehensive Attention Self-Distillation (CASD) training method for tackling weakly-supervised object detection. The method is empirically assessed on the COCO and PASCAL VOC benchmarks, where it shows impressive performance. A common concern across all reviews was the number of hyper-parameters of the method. Reviewers found the authors' detailed response very helpful, and some increased their initial scores. Another common but less critical concern was the writing, which can be improved for the final version.

